ABSTRACT


This  final  report  summarizes  the  findings  of  research  on  the  topic  "Stochastic 
Dynamic Systems with Multiple Decision Makers and Parametric Uncertainties", supported
by a Grant from the Air Force Office of Scientific Research, during the period May
1,  1985  -  April  30,  1988.  The  focus  of  the  research  during  this  three-year  period  has  been 
on  the  development  of  methodologies  and  new  solution  techniques  for  obtaining  strategies 
in  stochastic  systems,  with  good  sensitivity  properties,  and  for  deriving  optimal  decision 
rules  in  systems  with  nonclassical  information  patterns.  A  further  major  thrust  has  been 
on  the  development  of  learning  schemes  and  distributed  algorithms  for  multiple 
decision-maker problems under different types of uncertainty.

1. Multiple Decision-Maker Problems with Unknown Parameters

The problem of strategic decision making in complex systems which involve multiple
decision makers (DM's), multiple objectives, and incomplete information arises frequently
in the military context. As compared with single DM problems, the analysis of multiple
DM problems requires different approaches and techniques; furthermore, certain standard
features and properties we usually ascribe to single DM problems do not generally
extend naturally to multiple decision making. For example, while in single DM problems
optimization (minimization or maximization) of a single objective functional would,
in general, lead to a satisfactory decision policy (the so-called optimal policy), when the
decision problem involves multiple DM's and multiple objectives a plethora of possibilities
emerges as to the criterion which leads to a "satisfying" set of policies. Depending on
the number of DM's, their underlying goals, and the presence or absence of dominance in
the decision making process, we may have team-optimal, person-by-person optimal, Pareto
optimal, Nash equilibrium, Stackelberg (leader-follower) equilibrium, and consistent
conjectural variations equilibrium concepts, and several variants or combinations of these
in the case of more than two DM's. Each of these leads, in general, to a different outcome,
which also varies with the information structure of the problem (i.e., what each DM knows
a priori, what information he acquires during the evolution of the decision process, what
information exchange links are allowable, and what information transmission capability
each DM is vested with). The significance of information structure in multiple DM problems
also manifests itself in the derivation of multimodel strategies: model simplification
through singular perturbations or aggregation is not a well-posed procedure unless there
is some kind of matching between the information structures of the original problem and
the simplified version; no such inconsistencies arise, however, in single decision-maker
problems.

Recent years have witnessed considerable advances in our understanding of equilibrium
solutions of deterministic and stochastic multiperson decision problems, in particular
as regards the Stackelberg equilibrium solution. A class of such Stackelberg problems
which was long thought to be extremely challenging has been solved using indirect
methods, for both deterministic and stochastic systems. In some cases it has been shown
that the Stackelberg equilibrium strategy for the leader forces the DM's at lower levels of
the hierarchy to a team behavior, jointly optimizing the leader's performance index, even
though they may each have different goals and performance indices. In other cases, tight
performance bounds have been obtained on the leader's cost function, which are achievable
by implementable policies.

A large majority of this work on multiple DM problems pertains either to deterministic
systems or to systems with uncertain elements which have a complete probabilistic
description, this a priori information being known by all the DM's (the latter class of
problems is also known as stochastic dynamic games). Hence, even though some
decentralization of dynamically acquired information has been allowed for in the general
formulation of dynamic games, it has been a common assumption to endow every DM with
the common (centralized) a priori information regarding the complete statistical
description of the "primitive" random variables. Our thesis has been that such an underlying
assumption is not always a realistic one, especially when the decision problem involves
distributed tasks for the DM's. A more realistic formulation, in most cases, would allow
for discrepancies in the perceptions of the DM's regarding the underlying stochastic
model. These discrepancies could be accommodated in the model by having a number of
parameters which are either not stochastic, or are stochastic but whose complete statistical
description is not known by all the DM's.

The presence of unknown (or uncertain) parameters could affect the general problem
formulation  in  basically  three  different  ways: 

i)  Through  the  objective  functions.  Here,  the  objective  function  of  the  i’th  DM  may  not 
be known completely by the j'th DM (j ≠ i), with the uncertainty characterized by a
number  of  parameters  whose  values  are  unknown  to  the  j’th  DM. 

ii)  Through  the  system  response.  The  evolution  of  the  decision  process  may  depend  on  a 
number  of  parameters  whose  values  are  unknown  to  some  or  all  DM’s.  [This  type  of 
uncertainty  is  also  applicable  to  stochastic  team  problems.] 

iii) Through the measurements made by the DM's. Here either the observation scheme or
the statistics of some of the variables in the measurement process of a DM (or both)
may not be known to some other DM, with the uncertainty again being parameterized.
[As in ii), this type of uncertainty is also applicable to stochastic team problems.]

Multiple  DM  problems  with  the  types  of  uncertainties  as  described  above  can  be 
treated  by  adopting  essentially  one  of  the  following  three  approaches: 

a)  Robustness  or  Minimum  Sensitivity  Approach.  Here  one  assumes  some  nominal  values 
for  the  unknown  parameters,  determines  a  corresponding  nominal  performance  for 
the  system,  and  designs  decision  policies  which  would  lead  to  minimum  performance 
degradation  should  the  parameters  vary  around  their  nominal  values.  The  resulting 
decision  policies  are  called  minimum  sensitivity  strategies,  and  they  are  robust  in  a 
certain  neighborhood  of  the  nominal  values.  For  some  recent  advances  in  this  area 
and for motivation of this approach we refer to publications [P1], [P2] and [P11],
which report our recent work on this topic, carried out under AFOSR support.

b) Learning Schemes. In this approach no nominal values for the unknown parameters
will be available, but some a priori statistics may be attached to these parameters by
the DM's, which will be updated in a decentralized manner as new dynamic information
is acquired. This is akin to some of the methodologies developed earlier for control
problems with unknown parameters (such as identification, parameter estimation,
and adaptive control, all still active research areas), which are, however, not
applicable to multiple DM problems, because the rather intricate interactions of multiple
DM's render any central learning scheme infeasible. The iterative schemes which are
needed for such systems have to be decentralized and distributed, and also have to
account for the possibility that DM's may not update their policies or actions in a
predetermined order. There are the further questions of robustness of these schemes
to possible inaccuracies in the computation phase during each update, and robustness
to environmental changes. Learning schemes could involve two types of iterations:
"iteration in the policy space" and "iteration in the decision (action) space". During
the past three years we have devoted considerable attention to the former, under
AFOSR support; our accomplishments in this area are discussed in some detail in the
next section. Furthermore, we have made considerable progress on the second type of
iteration referred to above, as also elucidated in the next section.

c) Minimax Approach. Here no nominal values are available for the unknown parameters,
but they are known to belong to some pre-specified sets. Then, the objective is to
design strategies which would carry optimality or equilibrium property under worst
possible values of the parameters on these sets. [See, for example, [P9] and [P10] for
two different contexts where such a formulation would arise.] Such an approach
entails a pessimistic design philosophy, and is applicable mostly to decision problems
with a common objective functional (i.e., team problems). In multi-objective problems,
the minimax philosophy may lead to some ambiguity, since what may seem to
be a worst-case design for one objective functional may seem to lose this property
when tested against a different objective functional. However, if different objective
functionals are affected by different sets of unknown parameters, this approach
would still be applicable. Furthermore, in some information transmission problems
where the channel description is not complete, the minimax design (transmission)
philosophy finds a natural home, as elucidated particularly in the recent paper [P23].
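
The contrast between designing for a nominal parameter value and designing for the worst case can be seen in a small numerical sketch. The scalar cost J(u, theta) = (theta*u - 1)^2 + r*u^2, the parameter set [0.5, 1.5], the weight r and the grid search are all illustrative assumptions introduced here; they are not taken from [P9], [P10] or [P23].

```python
# Hypothetical scalar team problem: one decision u, one unknown parameter theta.
def cost(u, theta, r=0.1):
    return (theta * u - 1.0) ** 2 + r * u ** 2

def worst_case(u, thetas, r=0.1):
    # Largest cost that u can incur over the admissible parameter set.
    return max(cost(u, th, r) for th in thetas)

def nominal_design(theta0, r=0.1):
    # Exact minimizer of cost(u, theta0): set the derivative in u to zero.
    return theta0 / (theta0 ** 2 + r)

def minimax_design(thetas, candidates, r=0.1):
    # Brute-force search for the decision with the smallest worst-case cost.
    return min(candidates, key=lambda u: worst_case(u, thetas, r))

thetas = [0.5 + 0.01 * i for i in range(101)]              # theta in [0.5, 1.5]
u_nom = nominal_design(1.0)                                # design for theta0 = 1
candidates = [0.001 * i for i in range(3001)] + [u_nom]    # u in [0, 3]
u_mm = minimax_design(thetas, candidates)

print("nominal design:", u_nom, "worst case:", worst_case(u_nom, thetas))
print("minimax design:", u_mm, "worst case:", worst_case(u_mm, thetas))
```

The minimax decision gives up some performance at the nominal value in exchange for a smaller worst-case cost, which is the pessimistic design philosophy described in c).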

We should point out that a combination of any two or all three of the above
approaches would also constitute a viable approach to multi-person decision problems
with unknown parameters, which should be studied in proper contexts once the rudiments
of a theory for each one separately are laid down.


2.  Research  Accomplishments 

In our proposals for this research, carried out under AFOSR support during the past
three years, we had recognized the fact that the study of multiple DM problems with
uncertain parameters, as described above, is still in its infancy, in particular under the
"Learning Scheme" approach, and when the underlying information patterns are nonclassical.
In view of this, we proposed to conduct original fundamental research to make
theoretical advances in this field, of both methodological and algorithmic nature, and to
design implementable decision policies which carry both the learning and command
capabilities. To accomplish this, we proposed to adopt the general framework of deterministic
and stochastic dynamic games, and to conduct a study under the three types of
uncertainty introduced in Section 1, using different solution concepts such as team-optimal,
Nash equilibrium and Leader-Follower (hierarchical). We placed emphasis on the
development of decentralized and distributed schemes with learning capabilities, and also
proposed to study the issues of optimal and suboptimal strategy design in stochastic
control and team problems of the nonstandard type, especially those with nonclassical
information patterns, so as to enhance our understanding of the intricate role played by
information patterns, active and passive learning, and hierarchies in such problems.

During the past three years, we have addressed several challenging issues in this
context, and have made important strides. We provide below a brief summary of our
research  findings,  full  details  of  which  can  be  found  in  the  references  listed  in  Section  3. 
Copies  of  all  the  references  with  publication  dates  of  May  1987  or  later  are  attached  (in 
full)  to  this  report;  for  publications  of  the  two  previous  years,  we  attach  to  this  report 
only a selected number of them (those with asterisks), since they were all submitted
earlier along with the two previous progress reports.

We now return to brief descriptions of the main contributions of the papers listed in
Section 3. In the first group of papers, listed in Section 3 as [P1]-[P3], we have adopted the
first (i.e., minimum sensitivity) approach for a class of decision problems which display
the first type of uncertainty, viz. the case of one DM's cost function depending on a
number of parameters whose precise values are known by him but not by the other(s). In
[P1], we have presented a general mathematical formulation and a method of solution for
stochastic incentive decision problems, using concepts and tools of dynamic game theory.
As special cases of the general formulation we have considered four different classes of
problems which differ in the information available to the DM's, their objectives, and the
number of DM's at different levels of hierarchy. The fourth class we considered can be
viewed as an "exact model matching" problem akin to the one arising in nonlinear control.
In the paper, an explicit incentive policy has been obtained for the DM occupying the
higher level in the hierarchy, which, besides solving the exact matching problem, carries
very appealing minimum sensitivity properties. These features have also been demonstrated
in [P1] in the context of a numerical example. In [P2] we have extended these
results to more general models, involving decision problems defined on finite dimensional
spaces. In [P3] we have considered stochastic decision problems with two levels of
hierarchy, and N > 1 DM's (followers) occupying the lower level. We showed that if the
leader's dynamic information comprises only a linear combination of the followers'
actions, he can design a policy, affine in this dynamic information, which yields the same
overall performance as the one the leader would obtain had he observed the followers'
actions separately. This is a feature intrinsic to stochastic problems and has no
counterpart in deterministic systems. In the paper we have presented explicit solutions and
existence conditions for the case of finite probability spaces, and have identified several
challenging issues when the random variables are defined on infinite spaces. In more recent
work, reported in [P11], we extended the results of [P2] to multi-stage decision processes
where the objective functionals of the DM's depend on a time-varying uncertain
parameter. We have obtained strategies that use the past values of the state measurements
(i.e., memory), which desensitize the performance against variations of the uncertain
quantities about their nominal values.

The next two papers, [P4] and [P5], address a different class of problems, where the
uncertainty is of the second and third types (see Section 1), and the general approach is
the "learning scheme"; here, three solution concepts, viz. Nash, hierarchical, and
Pareto-optimal, are employed. While the discussion in [P4] pertains to two-person decision
problems, the sequel [P5] is devoted to the general N-agent case. The analyses cover both
finite and infinite-state models, where the uncertainties are in the statistical description
of the random variables appearing in the system dynamics and the measurements of the two
DM's, and the DM's are allowed to develop different prior probabilities on these random
variables. The papers develop different recursive schemes which involve "learning in the
policy space", and lead to policies that converge to the equilibrium under different
stipulations on the information structure of the problem. We have also analyzed the robustness
and sensitivity of team optimal solutions to deviations in the perceptions of the DM's
from a common stochastic model, and have shown that adoption of the Nash equilibrium
solution leads to well-posed models, whereas the other two solution concepts lead to
bifurcation once the model deviates from the nominal one. An important by-product is a
convergent algorithm which yields the optimal solution of a quadratic stochastic team problem
with decentralized information, in which the underlying statistics are not Gaussian.
Counterparts of these results in the case of n-person continuous-time stochastic
differential games have been obtained more recently in [P12] where, as before, we have
allowed for uncertainties in the statistical description of the random variables appearing
in the system dynamics and the measurements of the n decision makers (players).
Furthermore, the players were allowed to develop multiple (possibly inconsistent) probabilistic
models for the underlying system (state dynamics and measurement equations). We
obtained conditions for the existence and uniqueness of Nash equilibrium, and developed a
method for iterative distributed computation of the solution. The distributed algorithm
presented in [P12] involves learning in the policy space, and it does not require that each
player know the others' perceptions of the probabilistic model underlying the decision
process. For the finite horizon problem, such an iteration converges whenever the length of
the time horizon is sufficiently small, and the limit in this case is an affine policy for all
players, if the underlying distributions are Gaussian. When the horizon is infinite, and a
discount factor is used in the cost functionals, the iteration converges under conditions
depending on the magnitude of the discount factor, the limiting policies again being affine
in the case of Gaussian distributions.

Papers [P8] and [P13] deal with the development of distributed computational
schemes for nonlinear nonquadratic multi-person decision problems. One of the important
results is the derivation of a general condition (called persistent contraction) under
which a number of iteration schemes in two DM problems converge to the desired
equilibrium, when the cost functionals are nonquadratic and the DM's do not necessarily adopt
the same model. The algorithms developed use both accurate and inaccurate search
techniques in the policy space, and apply to discrete-time as well as continuous-time decision
problems. A number of numerical examples included in the two papers, as well as in the
thesis [T1], illustrate different features of these algorithms, and in particular their
superiority over both Newton and gradient type algorithms.

The iterative schemes studied in [P8] and [P13] all lead to an equilibrium solution
provided that it is stable. Roughly speaking, we say that an equilibrium solution of a
zero-sum or a nonzero-sum game is stable if, after any deviation from that equilibrium,
an adjustment process that involves unilateral optimal responses by the players can bring
it back to the starting point. One appealing feature of a stable equilibrium is that in the
on-line adjustment process each DM (or player) needs to know only his own cost function
and the most recently computed (and broadcast) policies of the other players, and not the
other players' cost functions. Needless to say, not all (saddle-point or Nash) solutions
are stable, and hence the question arises as to whether there exists an on-line
(real-time implementable) computational algorithm, different from those discussed in [P8]
and [P13], which would converge to an equilibrium even if that equilibrium is not stable. In
[P14] we have addressed precisely this question, and have introduced a relaxation
technique which leads to on-line implementable algorithms that converge to equilibria, be
they stable or not, and in some cases in a finite number of steps. In the paper, we have
also obtained conditions for the convergence of asynchronous algorithms, which arise in
the computation of equilibria in games where the order of responses is not fixed a priori.
The discussion and analyses are confined primarily to two-person deterministic problems,
with extensions to n-person games and stochastic games identified as challenging problems
for future research.
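
The difference between a plain best-response adjustment and a relaxed one can be illustrated on a toy two-person quadratic game. This is a minimal sketch under assumptions made here for illustration: the costs J1 = u1^2/2 - u1*(c1 + a*u2) and J2 = u2^2/2 - u2*(c2 + b*u1), the values a = 2, b = -2, c1 = c2 = 1, and the averaging step with weight alpha are all hypothetical, and the relaxation technique of [P14] may differ in its details. For these values the unique Nash equilibrium is (0.6, -0.2), and it is unstable under unilateral optimal responses because |a*b| > 1.

```python
def best_response(u, a=2.0, b=-2.0, c1=1.0, c2=1.0):
    # Each player minimizes his own quadratic cost against the other's last action.
    u1, u2 = u
    return (c1 + a * u2, c2 + b * u1)

def iterate(u0, alpha, steps):
    # alpha = 1 is the plain best-response adjustment; 0 < alpha < 1 relaxes it
    # by moving only part of the way toward the best response.
    u = u0
    for _ in range(steps):
        r = best_response(u)
        u = tuple((1.0 - alpha) * ui + alpha * ri for ui, ri in zip(u, r))
    return u

plain = iterate((0.0, 0.0), alpha=1.0, steps=30)      # moves away from equilibrium
relaxed = iterate((0.0, 0.0), alpha=0.2, steps=300)   # approaches (0.6, -0.2)
print("plain best response:", plain)
print("relaxed iteration:  ", relaxed)
```

In this example each player still uses only his own cost function and the other player's most recent action, which is the informational feature emphasized above; only the step length of the adjustment changes.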

The next paper in the list, [P15], addresses the second type of iteration and learning
scheme mentioned in Section 1: "iteration in the decision (action) space". Here our model is
a network of processors (DM's) connected by partial communication links and engaged in
distributed computation. They receive information, make decisions based on the
information in their buffer, and transmit the decision to a subset of other DM's, all at random
points in time. In the paper, we present results on the convergence and asymptotic
agreement of a general class of asynchronous algorithms which arise in this context. These
algorithms are in general time-varying, memory-dependent, and not necessarily associated
with the optimization of a common cost functional. We obtain the precise conditions
under which convergence to a unique set of decisions and asymptotic agreement on the
shared information can be reached by distributed learning. It is shown that a separation
of fast and slow parts of the algorithm is possible, leading to a separation of the
estimation process from the computation phase. These results are obtained using random
contraction mapping arguments and under various meaningful assumptions on the sizes of
buffers for the DM's to store the information they receive. More precisely, convergence is
established even if the memory of each processor is bounded, whereas asymptotic
agreement is achieved when there are no limitations imposed on the memory and
computational capabilities of the processors.
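
A much simplified picture of such an asynchronous iteration is sketched below. It assumes a fixed affine map x -> A x + b that is a contraction in the maximum norm (row sums of |A| equal to 0.5), processors that wake up at random, and information about the other processors that may be up to max_delay updates old. The matrix, the delay model and the function names are hypothetical choices made for this sketch; the class of algorithms treated in [P15] is more general (time-varying and memory-dependent).

```python
import random

A = [[0.0, 0.3, 0.2],
     [0.1, 0.0, 0.4],
     [0.3, 0.2, 0.0]]
b = [1.0, 2.0, 3.0]

def async_iterate(steps=2000, max_delay=5, seed=0):
    rng = random.Random(seed)
    n = len(b)
    history = [[0.0] * n]                   # history[t] = decisions after t updates
    for _ in range(steps):
        i = rng.randrange(n)                # processor that wakes up at this instant
        current = list(history[-1])
        total = b[i]
        for j in range(n):
            if j == i:
                continue
            d = rng.randint(0, max_delay)   # age of the buffered value from processor j
            stale = history[max(0, len(history) - 1 - d)]
            total += A[i][j] * stale[j]
        current[i] = total                  # only processor i changes its decision
        history.append(current)
    return history[-1]

def sync_fixed_point(iters=500):
    # Reference solution obtained by ordinary synchronous iteration.
    x = [0.0] * len(b)
    for _ in range(iters):
        x = [b[i] + sum(A[i][j] * x[j] for j in range(len(b))) for i in range(len(b))]
    return x

print(async_iterate())
print(sync_fixed_point())
```

Despite the random update order and the outdated information, the asynchronous run settles on the same decisions as the synchronous one, because every update contracts the maximum-norm error.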

As we have mentioned earlier, one of the most challenging problems in multiple-person
decision making is the problem of obtaining optimal solutions to stochastic teams
with nonclassical information. In a proper context, these problems may also be referred to
as "stochastic control problems with nonclassical information". One of the most important
and most referenced works in stochastic control is the 1968 paper by
Witsenhausen, where a counterexample was given to refute the common belief that all
LQG (linear quadratic Gaussian) control problems admit linear solutions. For a scalar
example with nonclassical information pattern, it was shown that the best linear solution
can be outperformed by a nonlinear solution, and more importantly that the derivation of
the best nonlinear solution (which exists) is a most challenging task, even numerically.
Almost twenty years have passed since then, and the question of the best solution for
that specific problem is still open. There has also remained the further important question
of the extent of validity of the features displayed in that paper for the general class of
stochastic control or team problems; in other words, are all stochastic team problems with
nonclassical information patterns inherently difficult and complex? In [P16] we have shed
some light on these questions, and have obtained some fundamental results. Specifically,
we have considered a parameterized class of two agent team problems with strictly
nonclassical dynamic information, which also includes the earlier formulation, and in this
class we have identified a subclass which admits a linear solution. When we go outside
this subclass, we show that the best linear solution can be outperformed by a linear
combination of linear and piecewise constant control laws, and we re-interpret the basic result
of Witsenhausen in this more general framework. Hence, we have a complete partitioning
of the parameter space into two regions, in one of which the optimal solution is linear, and
in the other it is inherently nonlinear, but a piecewise constant type control law alone
does not always improve upon the best linear law. In proving the optimality of linear
laws for the first class, we have adopted a unique approach and have used some results
from information theory.
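
The gap between linear and nonlinear laws in Witsenhausen's example can be checked numerically. The sketch below uses the standard statement of that example from the literature, which is not spelled out in this report: x0 ~ N(0, sigma^2), x1 = x0 + u1 with u1 a function of x0, y = x1 + v with v ~ N(0, 1), u2 a function of y, and cost k^2 E[u1^2] + E[(x1 - u2)^2]. The values k = 0.2 and sigma = 5 and the two-point first-stage law x1 = sigma*sign(x0) are illustrative assumptions; this is neither the parameterized class nor the method of [P16].

```python
import math

K, SIGMA = 0.2, 5.0   # illustrative values, chosen so that K**2 * SIGMA**2 == 1

def linear_cost(lam):
    # First stage sets x1 = lam * x0; second stage applies the linear
    # least-squares estimate of x1 given y = x1 + v.
    s2 = (lam * SIGMA) ** 2
    return K ** 2 * (1.0 - lam) ** 2 * SIGMA ** 2 + s2 / (1.0 + s2)

def best_linear_cost(grid=20001):
    # Grid search over lam in [-1, 2].
    return min(linear_cost(-1.0 + 3.0 * i / (grid - 1)) for i in range(grid))

def two_point_cost(n=20001, span=10.0):
    # First stage forces x1 = SIGMA * sign(x0); its cost has a closed form
    # because E|x0| = SIGMA * sqrt(2 / pi).
    stage1 = K ** 2 * 2.0 * SIGMA ** 2 * (1.0 - math.sqrt(2.0 / math.pi))
    # Second stage uses E[x1 | y] = SIGMA * tanh(SIGMA * y). By symmetry take
    # x1 = +SIGMA and integrate the squared error over the noise v.
    h = 2.0 * span / (n - 1)
    stage2 = 0.0
    for i in range(n):
        v = -span + i * h
        pdf = math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)
        err = SIGMA - SIGMA * math.tanh(SIGMA * (SIGMA + v))
        stage2 += err * err * pdf * h
    return stage1 + stage2

print("best linear cost:", best_linear_cost())
print("two-point cost:  ", two_point_cost())
```

For these parameter values the two-point law costs well under half of what the best linear law does, which is the phenomenon that motivates the partitioning of the parameter space described above.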

In [P17], we have extended the findings of [P16] in new important directions.
Specifically, we consider in that paper a stochastic dynamic team problem with two
controllers and nonclassical information, which can be interpreted as the transmission of a
garbled version of a Gaussian message over a number of noisy channels, under a given
fidelity criterion. We show that the optimal solution (under a quadratic loss functional)
consists of linearly transforming the garbled message to a certain (optimal) power level,
and then optimally decoding it by using a linear transformation at the receiving end. The
optimum power level alluded to above is determined by the solution of a fifth-order
algebraic equation. The paper also discusses an extension of this result to the case when the
channel noise is correlated with the input random variable, and it shows that the solution
is again linear for the single channel case.
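
To make the structure of such a linear scheme concrete, the following sketch evaluates the mean-square error of a linear encoder and linear decoder for a single channel with a prescribed power level. It assumes a scalar message x ~ N(0, var_x), a garbled version z = x + n, an encoder u = g*z scaled so that E[u^2] equals the given power, a channel y = u + w, and a linear least-squares decoder; all noises are independent and zero-mean. These modeling choices and names are assumptions made for illustration, and the sketch does not reproduce the multi-channel problem or the fifth-order equation for the optimal power level in [P17].

```python
def linear_scheme_mse(power, var_x=1.0, var_n=0.25, var_w=1.0):
    var_z = var_x + var_n               # variance of the garbled message z = x + n
    gain_sq = power / var_z             # encoder u = g * z with E[u^2] = power
    cov_xy = (gain_sq ** 0.5) * var_x   # Cov(x, y) for y = g * z + w
    var_y = power + var_w               # Var(y)
    # Error of the linear least-squares estimate of x from y.
    return var_x - cov_xy ** 2 / var_y

for p in (0.0, 1.0, 10.0, 1e9):
    print(p, linear_scheme_mse(p))
```

The error decreases as the transmission power grows, but it cannot fall below var_x*var_n/(var_x + var_n), the error of estimating x directly from the garbled message z.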

These results on two stage stochastic teams with nonclassical information have
subsequently been used in our research towards developing a general theory for multi-stage
(finite and infinite horizon) stochastic control and team problems with nonclassical
information, where the control (decision) variable affects not only the state trajectory but
also the quality of information available to the decision makers, thus exhibiting a dual
role. Two papers which report research results in that direction are [P18] and [P19],
which deal with two totally different classes of stochastic decision problems with
nonclassical information, and obtain explicit solutions in both cases using two totally different
approaches.

In  [P18]  we  consider  a  stochastic  dynamic  decision  problem  where  at  each  step  two 
consecutive  decisions  must  be  taken,  one  being  what  information  bearing  signal  to 
transmit,  and  the  other  what  control  action  to  exert.  Such  a  problem  arises  in  the  simul¬ 
taneous  optimization  of  both  the  observation  and  the  control  sequences  in  stochastic  sys¬ 
tems,  and  is  a  prime  example  of  a  stochastic  team  problem  with  nonclassical  information. 
Using  bounds  from  information  theory,  we  were  able  to  solve  this  problem  completely  for 
first-order  systems  under  a  quadratic  cost  criterion.  We  have  shown  in  [P18]  that,  in  the 
case  of  hard  power  constraints,  the  optimal  measurement  policy  consists  of  transmitting 
the  "innovation"  in  the  new  data  at  the  maximum  power  level.  In  the  case  when  the  power 
levels  at  the  transmitter  are  not  fixed,  the  optimal  power  levels  for  transmitting  this 
innovation  can  be  found  by  solving  a  nonlinear  optimal  control  problem.  These  results  are 
further  extended  in  [P20]  to  the  case  when  the  time  horizon  is  infinite  and  the  cost  func¬ 
tional  is  discounted,  in  which  context  we  prove  the  existence  of  optimal  stationary  control 
and  transmission  strategies,  and  provide  a  complete  characterization  of  the  optimal  solu¬ 
tion.  An  extension  in  a  different  direction  is  provided  in  [P21],  where  we  study  stochastic 
teams  with  (i)  more  than  one  DM  (agent)  who  performs  the  communication  task  of  gen¬ 
erating  information  bearing  signals,  and  (ii)  more  than  one  agent  performing  the  control 
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The  problem  treated  in  [PI 9]  is  another  stochastic  dynamic  optimization  problem 
which  exhibits  active  learning,  but  the  derivation  of  its  solution  requires  a  completely 
different  approach.  The  proof  of  optimality  given  in  the  paper  relates  the  original  single 
objective  problem  to  a  sequence  of  nested  zero-sum  stochastic  games.  Existence  of  saddle 
points  for  these  games  implies  the  existence  of  optimal  policies  for  the  original  stochastic 
control problem, which, in turn, can be obtained from the solution of a nonlinear deterministic
optimal control problem. The paper also studies the problem of existence of stationary
optimal policies when the time horizon is infinite and the objective function is
discounted. This is one of the first reports in the literature on the derivation of closed-form
solutions for a class of (non-neutral) stochastic control problems where the control
directly  affects  the  quality  of  information  carried  to  future  stages.  In  a  subsequent  work 
reported in [P22], we have provided a different perspective on the results of [P18] and [P19]
by introducing a unifying framework and by analyzing the contrasts between the formulations
and the methods of solution in the two papers.
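
As a rough numerical illustration of the hard-power-constraint result of [P18], consider a simplified scalar setting that is assumed here and not taken from the paper: a first-order source x(t+1) = a x(t) + w(t) with no control input, observed through an additive Gaussian channel y(t) = z(t) + v(t), where the transmitted signal z(t) must satisfy E[z(t)^2] <= P. If the transmitter sends the innovation scaled to the full power P, the estimation error variance follows the recursion sketched below. The function and parameter names are hypothetical.

```python
import math


def innovation_schedule(a, var_w, var_v, power, var_x0, steps):
    """Return a list of (gain, post_var) pairs for t = 0 .. steps-1.

    Assumed model (illustrative only):
        x(t+1) = a * x(t) + w(t),   Var[w] = var_w
        z(t)   = gain(t) * innovation(t)
        y(t)   = z(t) + v(t),       Var[v] = var_v
    where innovation(t) = x(t) - E[x(t) | y(0), ..., y(t-1)].
    """
    pred_var = var_x0  # variance of the innovation before the measurement
    schedule = []
    for _ in range(steps):
        # Scale the innovation so that E[z(t)^2] equals the power limit.
        gain = math.sqrt(power / pred_var)
        # Linear MMSE update for y = gain * innovation + v.
        post_var = pred_var * var_v / (power + var_v)
        schedule.append((gain, post_var))
        # Propagate the error variance through the source dynamics.
        pred_var = a * a * post_var + var_w
    return schedule
```

Under these assumptions each transmission shrinks the error variance by the fixed factor var_v / (power + var_v), which is why using less than the full power would only increase the estimation error.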

In [P23], we expand on the framework of [P16] and [P17] by allowing incomplete statistical
information on some of the variables and seeking an optimal solution under a
worst-case analysis. We again operate under nonclassical information patterns and consider
a number of cases depending on whether there are "hard" energy constraints or "soft"
constraints on some decision variables and/or "soft" costs on communications. We obtain
minimax  decision  rules  in  all  these  cases,  some  being  saddle  points  and  others  not,  the 
techniques  of  derivation  being  very  much  case-dependent.  Further  results  on  this  general 
class  of  problems  have  recently  been  reported  in  [P24],  where  the  optimal  (saddle-point) 
solution  dictates  the  use  of  a  probabilistic  decision  rule.  These  are  all  important  prototype 
problems  which  could  be  considered  essential  building  blocks  for  a  general  theory  of 
multi-stage  distributed  decision  making  under  nonclassical  information,  and  with  partial 
statistical  description. 

Two other papers in the list that deal with the minimax philosophy are [P9] and
[P10]. In [P9] we address the problem of designing time-invariant controllers for stochastic
systems with parametric uncertainties, under both the minimax and minimum-sensitivity
approaches. We first introduce some applicable theory, and then develop
numerical algorithms to obtain both the minimax and minimum-sensitivity controllers,
under different assumptions on the order of the controllers. A comparative analysis of
the numerical results displays some interesting features of the solution, which are discussed
at length in the paper. In [P10] we obtain a fundamental result for stochastic decision
problems with unknown parameters under a worst-case design philosophy. For a
fairly  general  class  of  sequential  decision  processes,  we  show  that  even  in  the  absence  of  a 
saddle  point,  the  min-max  strategy  can  be  obtained  by  means  of  a  dynamic  programming 
type  recursion.  In  addition  to  proving  this  theorem,  we  also  examine  the  precise  roles  of 
the  strategy  sets  allowed  to  the  minimizer  and  the  maximizer  in  determining  the  min-max 
value. 
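
The dynamic-programming-type recursion mentioned for [P10] can be illustrated on a toy problem. The sketch below assumes a deterministic finite-state simplification in which the unknown parameter is re-selected by the maximizer at every stage; this is an illustrative assumption, not the class of sequential decision processes treated in [P10], and all names are hypothetical. At each stage the inner maximization is over the parameter and the outer minimization is over the control, so the recursion produces the upper (min-max) value whether or not a saddle point exists.

```python
def minimax_dp(states, controls, params, step, cost, terminal_cost, horizon):
    """Backward min-max recursion on a finite problem.

    V_T(x) = terminal_cost(x)
    V_t(x) = min over u of max over th of
             [ cost(x, u, th) + V_{t+1}(step(x, u, th)) ]

    Returns (value, policy), where value maps each state to V_0(x) and
    policy[t][x] is the minimizing control at stage t in state x.
    """
    value = {x: terminal_cost(x) for x in states}
    policy = []
    for _ in range(horizon):
        new_value, rule = {}, {}
        for x in states:
            best_u, best_val = None, float("inf")
            for u in controls:
                # Worst case over the unknown parameter for this control.
                worst = max(
                    cost(x, u, th) + value[step(x, u, th)] for th in params
                )
                if worst < best_val:
                    best_u, best_val = u, worst
            new_value[x], rule[x] = best_val, best_u
        value = new_value
        policy.insert(0, rule)  # stages are computed from last to first
    return value, policy
```

Swapping the order of the inner `max` and the outer `min` would compute the max-min value instead; the two coincide only when a saddle point exists.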

The sixth paper in the list, [P6], addresses a decentralized large-scale decision (team)
problem with N DM's, and introduces a novel procedure to obtain suboptimal policies
with appealing features. It utilizes the method of chained aggregation to decompose the
overall team problem into (N+1) subproblems: one low-order team problem with a centralized
information structure and N decentralized optimal control problems. Accordingly,
the control of each DM is decomposed into three components: a decoupling control
which induces aggregation, a local control which controls the subsystem dynamics, and an
aggregate control which controls the dynamics of the interconnection variables. The paper
also establishes the robustness of this composite control with respect to perturbations in
the  system  dynamics  and  the  cost  functional. 

The  seventh  paper,  [P7],  is  a  tutorial,  based  on  an  opening  talk  given  by  the  PI  at  a 
conference  in  London  in  June  1985,  and  it  presents  in  a  nutshell  rudiments  of  the  theory 
of multi-person decision making in a dynamic environment. The paper not only surveys
the literature, but also introduces a mathematically rigorous definition for a new
solution concept: consistent conjectural variations equilibrium. This solution concept opens up
new  challenging  problems  in  this  area,  some  of  which  are  identified  in  the  paper.  In 
another tutorial paper, [C3], which is an opening plenary talk given at an IFAC Workshop
in  Beijing  in  August  1986,  we  provide  a  survey  on  the  optimum  design  of  organizations 
with  decentralized  information,  and  identify  a  number  of  challenging  issues  with  regard 
to  existence  of  smooth  incentive  schemes  and  their  robustness. 

Finally,  the  two  recent  papers  [P26]  and  [P27]  study  and  develop  a  new  approach  for 
policy  optimization  problems  that  involve  the  so-called  "forward-looking"  stochastic 
models.  Such  models  provide  a  characterization  of  decision  processes  where  the  evolution 
of  the  underlying  dynamics  depends  explicitly  on  the  expectations  the  controlling  agents 
form on the future evolution itself. They lead to nonstandard stochastic dynamic optimization
problems where one has to take into account the fact that there is a circular
(closed)  relationship  between  future  forecasts  and  the  future  system  behavior.  In  [P26] 
we  study  the  class  of  models  where  the  only  input  involves  a  two-step  ahead  prediction 
of  the  future  system  behavior,  by  formulating  them  as  stochastic  control  problems  (of  the 
delayed  information  type)  in  both  finite  and  infinite  horizons.  It  is  shown  that  when  there 
is  perfect  state  information,  the  solution  is  unique  for  both  the  finite  and  infinite  horizon 
formulations,  and  it  requires  memory  for  the  former  while  requiring  only  current  state 
information  for  the  latter.  When  only  noisy  measurements  are  available,  it  is  shown  that 
a  certainty-equivalence  type  result  holds,  the  memory  requirements  being  the  same  as 
above,  with  perfect  state  now  replaced  by  the  output  of  a  finite-dimensional  filter.  The 
second  paper,  [P27],  extends  these  results  to  forward-looking  models  which  have  two 
types  of  controlled  inputs  —  a  forecasting  strategy  and  a  tracking  strategy.  The  objective 
of  the  second  control  input  is  to  make  the  system  track  a  given  trajectory.  This  leads  to  a 
game-theoretic  formulation,  which  we  thoroughly  study  in  [P27]  for  both  finite  and 
infinite  horizons,  and  under  both  perfect  and  noisy  state  measurements.  In  all  cases  we 
show  that  the  problem  admits  a  unique  Nash  equilibrium  solution,  and  provide  a  complete 
characterization  of  the  corresponding  decision  rules. 